Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Authors

  • Ke Jiang
  • Brian Kulis
  • Michael I. Jordan
Abstract

Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.
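For the Gaussian case, the small-variance limit of the DP mixture yields a k-means-like hard clustering algorithm that can create new clusters on the fly (often called DP-means). Below is a minimal sketch of that idea, not the paper's exact algorithm: it uses squared Euclidean distance (one instance of the Bregman divergences in the general analysis), and `lam` is an assumed cluster-creation penalty hyperparameter.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Hard clustering from the small-variance limit of a Gaussian DP mixture.

    X:   (n, d) data array
    lam: penalty (squared distance) above which a point opens a new cluster
    """
    centers = [X.mean(axis=0)]  # start with a single global cluster
    prev = None
    for _ in range(max_iter):
        # assignment step: nearest center, or a brand-new cluster if too far
        assignments = []
        for x in X:
            d = [np.sum((x - c) ** 2) for c in centers]
            j = int(np.argmin(d))
            if d[j] > lam:
                centers.append(x.astype(float).copy())
                j = len(centers) - 1
            assignments.append(j)
        assignments = np.array(assignments)
        # update step: recompute means, dropping clusters that lost all points
        keep = [j for j in range(len(centers)) if np.any(assignments == j)]
        remap = {j: i for i, j in enumerate(keep)}
        centers = [X[assignments == j].mean(axis=0) for j in keep]
        assignments = np.array([remap[j] for j in assignments])
        if prev is not None and np.array_equal(prev, assignments):
            break  # assignments stabilized
        prev = assignments
    return np.array(centers), assignments
```

Unlike k-means, the number of clusters is not fixed in advance; it is controlled indirectly by `lam`, playing a role analogous to the DP concentration parameter.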


Related papers

MAP for Exponential Family Dirichlet Process Mixture Models

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time f...


Small Variance Asymptotics for Non-Parametric Online Robot Learning

Small variance asymptotics is emerging as a useful technique for inference in large scale Bayesian non-parametric mixture models. This paper analyses the online learning of robot manipulation tasks with Bayesian non-parametric mixture models under small variance asymptotics. The analysis yields a scalable online sequence clustering (SOSC) algorithm that is non-parametric in the number of cluste...


Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

We study a Bayesian framework for density modeling with mixtures of exponential family distributions. Our contributions:

  • A variational Bayesian solution for finite mixture models
  • A demonstration that finite mixture models (in a Bayesian setting) can determine the mixture number automatically
  • A justification of this result via connections to Dirichlet process mixture models
  • A fast variational Bayesian solutio...


Online Inference in Bayesian Non-Parametric Mixture Models under Small Variance Asymptotics

Adapting statistical learning models online with large-scale streaming data is a challenging problem. Bayesian non-parametric mixture models provide flexibility in model selection; however, their widespread use is limited by the computational overhead of existing sampling-based and variational techniques for inference. This paper analyses the online inference problem in Bayesian non-parametric m...


Hyperparameter estimation in Dirichlet process mixture models

In Bayesian density estimation and prediction using Dirichlet process mixtures of standard, exponential family distributions, the precision or total mass parameter of the mixing Dirichlet process is a critical hyperparameter that strongly influences resulting inferences about numbers of mixture components. This note shows how, with respect to a flexible class of prior distributions for this par...



Journal:

Volume   Issue

Pages  -

Publication date: 2012